A Novel Framework for Augmenting Rating Scale Tests with LLM-Scored Text Data

arXiv.org Artificial Intelligence

Psychological assessments are dominated by rating scales, which cannot capture the nuance in natural language. Efforts to supplement them with qualitative text have relied on labelled datasets or expert rubrics, limiting scalability. We introduce a framework that avoids this reliance: large language models (LLMs) score free-text responses with simple prompts to produce candidate LLM items, from which we retain those that yield the most test information when co-calibrated with a baseline scale. Using depression as a case study, we developed and tested the method in upper-secondary students (n=693) and a matched synthetic dataset (n=3,000). Results on held-out test sets showed that augmenting a 19-item scale with LLM items improved its precision, accuracy, and convergent validity. Further, the test information gain matched that of adding as many as 16 rating-scale items. This framework leverages the increasing availability of transcribed language to enhance psychometric measures, with applications in clinical health and beyond.
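The selection step described above (retain the candidate LLM items that yield the most test information) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the item response model, so the sketch assumes a two-parameter logistic (2PL) model with binary item scores, and it assumes the discrimination and difficulty parameters have already been estimated in a co-calibration step that is not shown. The item names, parameter values, and the helper `select_items` are all hypothetical.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at latent trait level theta.

    a is the discrimination and b the difficulty (assumed already estimated).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def total_information(items, thetas):
    """Sum each item's information over a grid of theta values."""
    return {
        name: sum(item_information(t, a, b) for t in thetas)
        for name, (a, b) in items.items()
    }


def select_items(candidates, thetas, k):
    """Return the names of the k candidates with the most summed information."""
    totals = total_information(candidates, thetas)
    return sorted(totals, key=totals.get, reverse=True)[:k]


# Hypothetical candidate LLM items as (discrimination, difficulty) pairs.
# These values are invented for illustration and do not come from the paper.
candidates = {
    "llm_sadness": (1.8, 0.0),
    "llm_sleep": (0.6, 0.5),
    "llm_hopelessness": (1.2, 1.0),
}

# Latent trait grid from -3.0 to 3.0 in steps of 0.5.
thetas = [x / 2 for x in range(-6, 7)]

selected = select_items(candidates, thetas, k=2)
print(selected)
```

Under a 2PL model an item's information peaks at `a**2 / 4` where theta equals its difficulty, so highly discriminating items dominate the ranking. A real analysis would more likely use a polytomous model for rating-scale data and estimate the parameters jointly with the baseline scale.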